📚 node [[epsilon_greedy_policy|epsilon greedy policy]]
⥅ related node [[epsilon_greedy_policy]]
⥅ node [[epsilon_greedy_policy]] pulled by Agora
📓 garden/KGBicheno/Artificial Intelligence/Introduction to AI/Week 3 - Introduction/Definitions/Epsilon_Greedy_Policy.md by @KGBicheno
epsilon greedy policy
Go back to the [[AI Glossary]]
#rl
In reinforcement learning, a policy that follows a random policy with probability epsilon and a greedy policy otherwise. For example, if epsilon is 0.9, the policy follows a random policy 90% of the time and a greedy policy 10% of the time.
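As a rough illustration (not from the original note), the selection rule can be sketched in a few lines of Python. Here `q_values` is an assumed list of estimated action values for the current state, and the function name is hypothetical:

```python
import random

def epsilon_greedy_action(q_values, epsilon):
    # Explore: with probability epsilon, pick a uniformly random action.
    if random.random() < epsilon:
        return random.randrange(len(q_values))
    # Exploit: otherwise pick the action with the highest estimated value.
    return max(range(len(q_values)), key=lambda a: q_values[a])
```

With epsilon = 0.9, a call like `epsilon_greedy_action([0.1, 0.5, 0.2], 0.9)` returns a random index 90% of the time and the greedy index (1) the remaining 10%.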
Over successive episodes, the algorithm reduces epsilon's value to shift from following a random policy to following a greedy policy. By shifting the policy, the agent first explores the environment randomly and then greedily exploits the results of that random exploration.
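A minimal sketch of one such schedule, assuming a multiplicative decay with a floor so some exploration always remains (all constants here are illustrative, not from the source):

```python
epsilon, epsilon_min, decay = 1.0, 0.05, 0.995  # assumed hyperparameters
num_episodes = 1000                             # assumed episode count

for episode in range(num_episodes):
    # ... run one episode, choosing actions with epsilon_greedy_action ...
    # Anneal epsilon so the agent shifts from random exploration
    # toward greedy exploitation, keeping a small residual epsilon.
    epsilon = max(epsilon_min, epsilon * decay)
```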